class: center, middle, inverse, title-slide

# Introduction to Survey Data Cleaning Using Tidyverse in R
## Introduction
### Johannes Breuer
Stefan Jünger
### 2021-07-22

---
layout: true

<div class="my-footer">
  <div style="float: left;"><span>Johannes Breuer, Stefan Jünger</span></div>
  <div style="float: right;"><span>ESRA 2021, 2021-07-22</span></div>
  <div style="text-align: center;"><span>Introduction</span></div>
</div>

---

## About us

### Johannes Breuer

.small[
- Senior researcher in the team Data Augmentation, Department Survey Data Curation, [*GESIS - Leibniz Institute for the Social Sciences*](https://www.gesis.org/en/home), Cologne, Germany

- (Co-)Leader of the team Research Data & Methods at the [*Center for Advanced Internet Studies*](https://www.cais.nrw/en/center-for-advanced-internet-studies-cais-en/) (CAIS), Bochum, Germany

- Main areas:
  - digital trace data for social science research
  - data linking (surveys + digital trace data)

- Ph.D. in Psychology, University of Cologne

- Previously worked in several research projects investigating the use and effects of digital media (Cologne, Hohenheim, Münster, Tübingen)

- Other research interests:
  - Computational methods
  - Data management
  - Open science

[johannes.breuer@gesis.org](mailto:johannes.breuer@gesis.org), [@MattEagle09](https://twitter.com/MattEagle09), [personal website](https://www.johannesbreuer.com/)
]

---

## About us

### Stefan Jünger

.pull-left[
<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\tidyverse-workshop-esra-2021\content\img\stefan.png" width="50%" style="display: block; margin: auto;" />
]

.pull-right[
- Postdoctoral researcher in the team Data Augmentation at the GESIS department Survey Data Curation
- Ph.D. in social sciences, University of Cologne
]

- Research interests:
  - quantitative methods & Geographic Information Systems (GIS)
  - social inequalities & attitudes towards minorities
  - data management & data privacy
  - reproducible research

.small[
[stefan.juenger@gesis.org](mailto:stefan.juenger@gesis.org) | [@StefanJuenger](https://twitter.com/StefanJuenger) | [https://stefanjuenger.github.io](https://stefanjuenger.github.io)
]

---

## About you

Please use the text chat to introduce yourself:

- What's your name?
- Where do you work?
- What do you work on?
- What are your experiences with `R` and the `tidyverse`?
- What are your motivations for joining this course? What are your expectations for this course?

---

## Prerequisites for this course

.large[
- Working versions of `R` and *RStudio*
- Some basic knowledge of `R`
- The `tidyverse` packages
]

---

## Workshop Structure & Materials

- The workshop consists of a combination of short lectures and hands-on exercises

- Slides and other materials are available at

.center[`https://github.com/jobreu/tidyverse-workshop-esra-2021`]

---

## Course schedule

<table>
 <thead>
  <tr>
   <th style="text-align:center;"> When? </th>
   <th style="text-align:center;"> What?
</th>
  </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:center;"> 13:00 - 13:20 </td> <td style="text-align:center;"> Introduction: Welcome to the tidyverse </td> </tr>
  <tr> <td style="text-align:center;"> 13:20 - 13:30 </td> <td style="text-align:center;"> Exercise 1 </td> </tr>
  <tr> <td style="text-align:center;"> 13:30 - 13:45 </td> <td style="text-align:center;"> Data Import </td> </tr>
  <tr> <td style="text-align:center;"> 13:45 - 14:00 </td> <td style="text-align:center;"> Exercise 2 </td> </tr>
  <tr> <td style="text-align:center;"> 14:00 - 14:30 </td> <td style="text-align:center;"> Data Wrangling - Part 1 </td> </tr>
  <tr> <td style="text-align:center;"> 14:30 - 14:45 </td> <td style="text-align:center;"> Exercise 3 </td> </tr>
  <tr> <td style="text-align:center;"> 14:45 - 15:00 </td> <td style="text-align:center;"> <i>Coffee break</i> </td> </tr>
  <tr> <td style="text-align:center;"> 15:00 - 15:30 </td> <td style="text-align:center;"> Data Wrangling - Part 2 </td> </tr>
  <tr> <td style="text-align:center;"> 15:30 - 15:45 </td> <td style="text-align:center;"> Exercise 4 </td> </tr>
  <tr> <td style="text-align:center;"> 15:45 - 16:00 </td> <td style="text-align:center;"> Wrap-Up </td> </tr>
</tbody>
</table>

---

## Online format

- If possible, we invite you to turn on your camera

- If you have an immediate question during the lecture parts, please send it via text chat
  - Public or private (ideally to the person currently not presenting if you want an immediate response)

- If you have a question that is not urgent and might be interesting for everybody, you can also use audio (& video) to ask it during the exercise parts

- We would also kindly ask you to mute your microphones when you are not asking a question or engaging in discussions or group work

---

## What is the `tidyverse`?

> The `tidyverse` is an .highlight[opinionated collection of R packages designed for data science].
All packages share an .highlight[underlying design philosophy, grammar, and data structures] ([Tidyverse website](https://www.tidyverse.org/)).

> The `tidyverse` is a .highlight[coherent system of packages for data manipulation, exploration and visualization] that share a .highlight[common design philosophy] ([Rickert, 2017](https://rviews.rstudio.com/2017/06/08/what-is-the-tidyverse/)).

<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\tidyverse-workshop-esra-2021\content\img\hex-tidyverse.png" width="25%" style="display: block; margin: auto;" />

---

## Benefits of the `tidyverse`

.large[
Most of the things we are going to show you can also be achieved with base `R`. However, the base `R` syntax is typically more verbose and less intuitive and, hence, harder to learn, remember, and read. In addition, some `tidyverse` operations are faster than their base `R` equivalents.
]

---

## Benefits of the `tidyverse`

.large[
`Tidyverse` syntax is designed to increase **human-readability**. This makes it especially **attractive for `R` novices** as it can facilitate the experience of **self-efficacy** (see [Robinson, 2017](http://varianceexplained.org/r/teach-tidyverse/)).

The `tidyverse` also aims for **consistency** (e.g., data frame as first argument and output) and uses **smarter defaults** (e.g., no partial matching of data frame and column names).
]

---

## `tidyverse` for `R` beginners

```r
# Assumes the memer package is installed and loaded
meme_get("DistractedBf") %>% 
  meme_text_distbf("tidyverse", "new R users", "base R")
```

<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\tidyverse-workshop-esra-2021\content\img\DistractedBf.png" width="60%" style="display: block; margin: auto;" />

---

## Workflow

.center[
<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\tidyverse-workshop-esra-2021\content\img\data-science.png" width="60%" style="display: block; margin: auto;" />
]
<small><small>Source: http://r4ds.had.co.nz/</small></small>

.highlight[- **Import**: read in data in different formats (e.g., .csv, .xls, .sav, .dta)
- **Tidy**: clean data (1 row = 1 case, 1 column = 1 variable), rename & recode variables, etc.
- **Transform**: prepare data for analysis (e.g., by aggregating and/or filtering)]
- **Visualize**: explore/analyze data through informative plots
- **Model**: analyze the data by creating models (e.g., linear regression model)
- **Communicate**: present the results (to others)

---

## `Tidyverse` workflow

.center[
<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\tidyverse-workshop-esra-2021\content\img\tidyverse-1200x484.png" width="1600" style="display: block; margin: auto;" />
]
<small><small>Source: http://www.storybench.org/getting-started-with-tidyverse-in-r/</small></small>

---

## Lift-off into the `tidyverse` 🚀

**Install all `tidyverse` packages** (for the full list of `tidyverse` packages see [https://www.tidyverse.org/packages/](https://www.tidyverse.org/packages/))

```r
install.packages("tidyverse")
```

**Load core `tidyverse` packages** (NB: To save time and reduce namespace conflicts, it can make sense to load the `tidyverse` packages individually)

.small[
```r
library("tidyverse")
```
]

---

## Data for this workshop

For this workshop, we will use a synthetic data set based on the data from the [*GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany*](https://search.gesis.org/research_data/ZA5667). This synthetic data set was created by [Bernd Weiß](https://berndweiss.net/) using the [`synthpop` package](https://www.synthpop.org.uk/).

Original data set: GESIS Panel Team (2020). *GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany*. GESIS Data Archive, Cologne. ZA5667 Data file Version 1.1.0, [https://doi.org/10.4232/1.13520](https://doi.org/10.4232/1.13520)

---

## `tidyverse` vocab 101

We will focus on three key things here:

1. Tidy data

2. Tibbles

3. Pipes

---

## Tidy data

The 3 rules of tidy data:

1. Each **variable** is in a separate **column**.

2. Each **observation** is in a separate **row**.

3. Each **value** is in a separate **cell**.

<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\tidyverse-workshop-esra-2021\content\img\tidy_data.png" width="2560" style="display: block; margin: auto;" />

Source: https://r4ds.had.co.nz/tidy-data.html

*NB*: In `tidyverse` terminology, 'tidy data' usually also means data in long format (where applicable).

---

## Wide vs. long format

<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\tidyverse-workshop-esra-2021\content\img\wide-long.png" width="90%" style="display: block; margin: auto;" />

Source: https://github.com/gadenbuie/tidyexplain#tidy-data

---

## Tibbles all the way

.pull-left[
Tibbles are basically just `R` `data.frame`s, but nicer.

- only the first ten observations are printed
- output is tidier!
- you get some additional metadata about rows and columns that you'd usually only get when, e.g., using `dim()` and other functions

Please refer to this [vignette](https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html) for the technical details.
] .pull-right[ <img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\tidyverse-workshop-esra-2021\content\img\tibble.png" width="60%" style="display: block; margin: auto;" /> .center[[Source](https://github.com/tidyverse/tibble/blob/master/man/figures/logo.png)] ] --- ## A `data.frame` .tiny[ ``` ## cohort sex age_cat education_cat intention_to_vote choice_of_party political_orientation marstat ## 1 2 1 10 3 2 98 7 1 ## 2 1 2 2 3 2 5 3 2 ## 3 1 1 8 1 2 2 7 1 ## 4 2 2 1 3 2 98 1 2 ## 5 3 2 7 3 2 5 2 2 ## 6 2 2 7 2 2 1 2 1 ## 7 1 2 7 3 2 5 3 1 ## 8 2 1 7 3 NA NA 3 1 ## 9 2 2 8 3 2 98 3 1 ## household hzcy001a hzcy002a hzcy003a hzcy004a hzcy005a hzcy006a hzcy007a hzcy008a hzcy009a hzcy010a ## 1 2 4 6 3 6 4 1 1 0 0 0 ## 2 2 4 6 6 6 4 1 1 0 0 1 ## 3 2 2 2 2 2 2 1 1 1 0 0 ## 4 3 NA NA NA NA NA NA NA NA NA NA ## 5 2 6 6 4 6 6 1 0 0 0 0 ## 6 2 4 4 3 4 4 1 1 0 0 0 ## 7 1 4 4 3 4 4 1 1 1 0 0 ## 8 3 NA NA NA NA NA NA NA NA NA NA ## 9 2 NA NA NA NA NA NA NA NA NA NA ## hzcy011a hzcy012a hzcy013a hzcy014a hzcy015a hzcy016a hzcy018a hzcy019a hzcy020a hzcy021a hzcy022a ## 1 1 0 1 1 0 0 0 4 4 4 4 ## 2 1 1 0 1 0 0 0 5 5 5 5 ## 3 1 1 0 1 0 0 0 5 5 5 5 ## 4 NA NA NA NA NA NA NA NA NA NA NA ## 5 1 1 0 1 0 0 0 4 4 4 4 ## 6 1 1 0 1 0 0 0 NA 4 4 4 ## 7 1 1 0 1 0 0 0 5 5 5 5 ## 8 NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA ## hzcy023a hzcy024a hzcy025a hzcy026a hzcy027a hzcy028a hzcy029a hzcy030a hzcy031a hzcy032a hzcy033a ## 1 4 2 2 1 4 2 3 3 3 3 NA ## 2 5 5 5 1 5 2 5 5 5 5 NA ## 3 5 5 5 1 5 3 5 5 5 5 NA ## 4 NA NA NA NA NA NA NA NA NA NA NA ## 5 4 4 4 1 3 2 4 4 4 4 NA ## 6 5 2 2 1 4 3 4 4 4 4 NA ## 7 5 5 5 1 5 4 5 5 5 5 NA ## 8 NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA ## hzcy034a hzcy035a hzcy036a hzcy037a hzcy038a hzcy039a hzcy040a hzcy041a hzcy042a hzcy043a hzcy044a ## 1 NA NA NA NA NA NA 2 5 2 2 2 ## 2 NA NA NA NA NA NA 2 3 2 3 4 ## 3 NA NA NA NA NA NA 2 2 2 2 4 ## 4 NA NA NA NA NA NA NA NA NA NA NA ## 5 NA NA 
NA NA NA NA 3 3 3 3 5 ## 6 NA NA NA NA NA NA 3 3 2 3 4 ## 7 NA NA NA NA NA NA 2 2 1 3 4 ## 8 NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA ## hzcy045a hzcy046a hzcy047a hzcy048a hzcy049a hzcy050a hzcy051a hzcy052a hzcy053a hzcy054a hzcy055a ## 1 4 4 5 5 5 5 5 5 5 NA NA ## 2 5 5 5 5 5 5 5 5 1 0 0 ## 3 4 4 4 4 4 4 4 4 1 0 0 ## 4 NA NA NA NA NA NA NA NA NA NA NA ## 5 4 4 4 4 4 4 4 5 1 0 0 ## 6 4 4 5 4 4 4 4 98 1 0 0 ## 7 3 3 4 4 4 4 4 4 1 0 0 ## 8 NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA ## hzcy056a hzcy057a hzcy058a hzcy059a hzcy060a hzcy061a hzcy062a hzcy063a hzcy064a hzcy065a hzcy066a ## 1 NA NA NA NA NA NA NA NA NA NA NA ## 2 0 0 0 0 1 NA NA NA NA NA NA ## 3 0 0 0 0 1 NA NA NA NA NA NA ## 4 NA NA NA NA NA NA NA NA NA NA NA ## 5 0 0 0 0 1 NA NA NA NA NA NA ## 6 0 0 0 0 1 NA NA NA NA NA NA ## 7 1 0 0 0 0 NA NA NA NA NA NA ## 8 NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA ## hzcy067a hzcy068a hzcy069a hzcy070a hzcy071a hzcy072a hzcy073a hzcy074a hzcy075a hzcy076a hzcy077a ## 1 NA NA NA NA 2 NA NA NA NA NA NA ## 2 NA NA NA NA 2 NA NA NA NA NA NA ## 3 NA NA NA NA 2 NA NA NA NA NA NA ## 4 NA NA NA NA NA NA NA NA NA NA NA ## 5 NA NA NA NA 2 NA NA NA NA NA NA ## 6 NA NA NA NA 2 NA NA NA NA NA NA ## 7 NA NA NA NA 2 NA NA NA NA NA NA ## 8 NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA ## hzcy078a hzcy079a hzcy080a hzcy081a hzcy083a hzcy084a hzcy085a hzcy086a hzcy087a hzcy088a hzcy089a ## 1 NA NA NA NA NA 1 1 0 1 1 0 ## 2 NA NA NA NA NA 1 0 1 0 0 0 ## 3 NA NA NA NA NA 1 1 0 0 0 1 ## 4 NA NA NA NA NA NA NA NA NA NA NA ## 5 NA NA NA NA NA 1 0 1 0 0 0 ## 6 NA NA NA NA NA 1 1 0 1 0 1 ## 7 NA NA NA NA NA 1 0 1 0 0 0 ## 8 NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA ## hzcy090a hzcy091a hzcy092a hzcy093a hzcy095a hzcy096a hzcy097a hzcy098a hzcy099a hzza003a hzzq009a ## 1 0 0 1 0 0 NA NA NA NA 1 5 ## 2 0 0 0 0 0 NA NA NA NA 1 5 ## 3 0 0 0 0 0 NA NA NA 
NA 1 4 ## 4 NA NA NA NA NA NA NA NA NA 0 NA ## 5 0 0 1 0 0 NA NA NA NA 1 4 ## 6 0 1 0 0 0 NA NA NA NA 1 4 ## 7 1 0 1 0 0 1 1 0 1 1 4 ## 8 NA NA NA NA NA NA NA NA NA 0 NA ## 9 NA NA NA NA NA NA NA NA NA 0 NA ## hzzq023a hzzp201a hzzp204a hzzp207a ## 1 5 31 651 1585223562 ## 2 5 31 336 1584510380 ## 3 4 31 405 1585348329 ## 4 NA NA NA NA ## 5 5 31 411 1584468409 ## 6 4 31 443 1584968090 ## 7 4 31 412 1585408051 ## 8 NA NA NA NA ## 9 NA NA NA NA ## [ reached 'max' / getOption("max.print") -- omitted 3756 rows ] ``` ] --- ## A `tibble` .tiny[ ``` ## # A tibble: 3,765 x 111 ## cohort sex age_cat education_cat intention_to_vote choice_of_party political_orienta~ marstat household ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 2 1 10 3 2 98 7 1 2 ## 2 1 2 2 3 2 5 3 2 2 ## 3 1 1 8 1 2 2 7 1 2 ## 4 2 2 1 3 2 98 1 2 3 ## 5 3 2 7 3 2 5 2 2 2 ## 6 2 2 7 2 2 1 2 1 2 ## 7 1 2 7 3 2 5 3 1 1 ## 8 2 1 7 3 NA NA 3 1 3 ## 9 2 2 8 3 2 98 3 1 2 ## 10 2 2 6 2 NA 98 5 1 2 ## # ... with 3,755 more rows, and 102 more variables: hzcy001a <dbl>, hzcy002a <dbl>, hzcy003a <dbl>, ## # hzcy004a <dbl>, hzcy005a <dbl>, hzcy006a <dbl>, hzcy007a <dbl>, hzcy008a <dbl>, hzcy009a <dbl>, ## # hzcy010a <dbl>, hzcy011a <dbl>, hzcy012a <dbl>, hzcy013a <dbl>, hzcy014a <dbl>, hzcy015a <dbl>, ## # hzcy016a <dbl>, hzcy018a <dbl>, hzcy019a <dbl>, hzcy020a <dbl>, hzcy021a <dbl>, hzcy022a <dbl>, ## # hzcy023a <dbl>, hzcy024a <dbl>, hzcy025a <dbl>, hzcy026a <dbl>, hzcy027a <dbl>, hzcy028a <dbl>, ## # hzcy029a <dbl>, hzcy030a <dbl>, hzcy031a <dbl>, hzcy032a <dbl>, hzcy033a <dbl>, hzcy034a <dbl>, ## # hzcy035a <dbl>, hzcy036a <dbl>, hzcy037a <dbl>, hzcy038a <dbl>, hzcy039a <dbl>, hzcy040a <dbl>, ## # hzcy041a <dbl>, hzcy042a <dbl>, hzcy043a <dbl>, hzcy044a <dbl>, hzcy045a <dbl>, hzcy046a <dbl>, ## # hzcy047a <dbl>, hzcy048a <dbl>, hzcy049a <dbl>, hzcy050a <dbl>, hzcy051a <dbl>, hzcy052a <dbl>, ## # hzcy053a <dbl>, hzcy054a <dbl>, hzcy055a <dbl>, hzcy056a <dbl>, hzcy057a <dbl>, hzcy058a <dbl>, ## # 
hzcy059a <dbl>, hzcy060a <dbl>, hzcy061a <dbl>, hzcy062a <dbl>, hzcy063a <dbl>, hzcy064a <dbl>, ## # hzcy065a <dbl>, hzcy066a <dbl>, hzcy067a <dbl>, hzcy068a <dbl>, hzcy069a <dbl>, hzcy070a <dbl>, ## # hzcy071a <dbl>, hzcy072a <dbl>, hzcy073a <dbl>, hzcy074a <dbl>, hzcy075a <dbl>, hzcy076a <dbl>, ## # hzcy077a <dbl>, hzcy078a <dbl>, hzcy079a <dbl>, hzcy080a <dbl>, hzcy081a <dbl>, hzcy083a <dbl>, ## # hzcy084a <dbl>, hzcy085a <dbl>, hzcy086a <dbl>, hzcy087a <dbl>, hzcy088a <dbl>, hzcy089a <dbl>, ## # hzcy090a <dbl>, hzcy091a <dbl>, hzcy092a <dbl>, hzcy093a <dbl>, hzcy095a <dbl>, hzcy096a <dbl>, ## # hzcy097a <dbl>, hzcy098a <dbl>, hzcy099a <dbl>, hzza003a <dbl>, hzzq009a <dbl>, hzzq023a <dbl>, ## # hzzp201a <dbl>, ... ``` ] --- ## Converting data into tibbles We can also convert any `data.frame` into a `tibble`: ```r gpc <- as.data.frame(gpc) tibble::as_tibble(gpc) ``` .tiny[ ``` ## # A tibble: 3,765 x 111 ## cohort sex age_cat education_cat intention_to_vote choice_of_party political_orienta~ marstat household ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 2 1 10 3 2 98 7 1 2 ## 2 1 2 2 3 2 5 3 2 2 ## 3 1 1 8 1 2 2 7 1 2 ## 4 2 2 1 3 2 98 1 2 3 ## 5 3 2 7 3 2 5 2 2 2 ## 6 2 2 7 2 2 1 2 1 2 ## 7 1 2 7 3 2 5 3 1 1 ## 8 2 1 7 3 NA NA 3 1 3 ## 9 2 2 8 3 2 98 3 1 2 ## 10 2 2 6 2 NA 98 5 1 2 ## # ... 
with 3,755 more rows, and 102 more variables: hzcy001a <dbl>, hzcy002a <dbl>, hzcy003a <dbl>, ## # hzcy004a <dbl>, hzcy005a <dbl>, hzcy006a <dbl>, hzcy007a <dbl>, hzcy008a <dbl>, hzcy009a <dbl>, ## # hzcy010a <dbl>, hzcy011a <dbl>, hzcy012a <dbl>, hzcy013a <dbl>, hzcy014a <dbl>, hzcy015a <dbl>, ## # hzcy016a <dbl>, hzcy018a <dbl>, hzcy019a <dbl>, hzcy020a <dbl>, hzcy021a <dbl>, hzcy022a <dbl>, ## # hzcy023a <dbl>, hzcy024a <dbl>, hzcy025a <dbl>, hzcy026a <dbl>, hzcy027a <dbl>, hzcy028a <dbl>, ## # hzcy029a <dbl>, hzcy030a <dbl>, hzcy031a <dbl>, hzcy032a <dbl>, hzcy033a <dbl>, hzcy034a <dbl>, ## # hzcy035a <dbl>, hzcy036a <dbl>, hzcy037a <dbl>, hzcy038a <dbl>, hzcy039a <dbl>, hzcy040a <dbl>, ## # hzcy041a <dbl>, hzcy042a <dbl>, hzcy043a <dbl>, hzcy044a <dbl>, hzcy045a <dbl>, hzcy046a <dbl>, ## # hzcy047a <dbl>, hzcy048a <dbl>, hzcy049a <dbl>, hzcy050a <dbl>, hzcy051a <dbl>, hzcy052a <dbl>, ## # hzcy053a <dbl>, hzcy054a <dbl>, hzcy055a <dbl>, hzcy056a <dbl>, hzcy057a <dbl>, hzcy058a <dbl>, ## # hzcy059a <dbl>, hzcy060a <dbl>, hzcy061a <dbl>, hzcy062a <dbl>, hzcy063a <dbl>, hzcy064a <dbl>, ## # hzcy065a <dbl>, hzcy066a <dbl>, hzcy067a <dbl>, hzcy068a <dbl>, hzcy069a <dbl>, hzcy070a <dbl>, ## # hzcy071a <dbl>, hzcy072a <dbl>, hzcy073a <dbl>, hzcy074a <dbl>, hzcy075a <dbl>, hzcy076a <dbl>, ## # hzcy077a <dbl>, hzcy078a <dbl>, hzcy079a <dbl>, hzcy080a <dbl>, hzcy081a <dbl>, hzcy083a <dbl>, ## # hzcy084a <dbl>, hzcy085a <dbl>, hzcy086a <dbl>, hzcy087a <dbl>, hzcy088a <dbl>, hzcy089a <dbl>, ## # hzcy090a <dbl>, hzcy091a <dbl>, hzcy092a <dbl>, hzcy093a <dbl>, hzcy095a <dbl>, hzcy096a <dbl>, ## # hzcy097a <dbl>, hzcy098a <dbl>, hzcy099a <dbl>, hzza003a <dbl>, hzzq009a <dbl>, hzzq023a <dbl>, ## # hzzp201a <dbl>, ... ``` ] --- ## Pipes everywhere... 
<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\tidyverse-workshop-esra-2021\content\img\OprahGiveaway.png" width="60%" style="display: block; margin: auto;" />

---

## The Logic of Pipes

Usually, in `R` we apply functions as follows:

```r
f(x)
```

In the logic of pipes this function is written as:

```r
x %>% f(.)
```

--

We can use pipes on more than one function:

```r
x %>% 
  f_1() %>% 
  f_2() %>% 
  f_3()
```

More details: https://r4ds.had.co.nz/pipes.html

---

## Resources

There are hundreds of tutorials, courses, blog posts, etc. about the `tidyverse` available online.

The book [*R for Data Science*](https://r4ds.had.co.nz/) by [Hadley Wickham](http://hadley.nz/) and [Garrett Grolemund](https://twitter.com/statgarrett) (which is available for free online) provides a very comprehensive introduction to the `tidyverse`.

The weekly [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday) data projects and the associated [#tidytuesday Twitter hashtag](https://twitter.com/hashtag/tidytuesday?lang=en) are also a fun way of learning and practicing data wrangling and exploration with the `tidyverse`.

---

## Cheat sheets

*RStudio* offers a good collection of [cheat sheets for R](https://www.rstudio.com/resources/cheatsheets/). The following two are of particular interest for this workshop:

- [Data Import Cheat Sheet](https://github.com/rstudio/cheatsheets/raw/master/data-import.pdf)

- [Data Transformation Cheat Sheet](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf)

---
class: center, middle

# Any questions so far ❓
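
---

## Appendix: pipes and tibbles in practice

As a minimal sketch of the pipe logic and the tibble conversion covered in this introduction, the example below compares a nested call with its piped equivalent and then converts a `data.frame` into a tibble. The object `toy` and its values are made up for illustration (only the column names mimic the workshop data), and the example assumes that the `tibble` and `magrittr` packages are installed.

```r
library(tibble)    # as_tibble()
library(magrittr)  # the %>% pipe

# Hypothetical toy data: column names mimic the workshop data,
# the values are invented for illustration only
toy <- data.frame(
  sex     = c(1, 2, 2, 1),
  age_cat = c(10, 2, 8, 7)
)

# Nested call: has to be read from the inside out
nested <- round(mean(sqrt(toy$age_cat)), 2)

# Piped call: reads from left to right, one step per line
piped <- toy$age_cat %>%
  sqrt() %>%
  mean() %>%
  round(2)

# Both versions perform the same computation
identical(nested, piped)

# Converting the data.frame keeps the data but adds the tibble classes
toy_tbl <- as_tibble(toy)
class(toy_tbl)
```

Each pipe step passes the result on its left as the first argument of the function on its right, which is why `round(2)` rounds the piped-in mean to two decimal places.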